Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor interfaces and engine code #153

Merged
merged 86 commits into from
Sep 14, 2020
Merged

Conversation

wild-endeavor
Copy link
Contributor

@wild-endeavor wild-endeavor commented Aug 11, 2020

TL;DR

Changes are:

  • Formalizing the SdkTask, SdkWorkflow, and SdkLaunchPlan classes as control plane smart objects.
  • Removal of the engine layer as it pertains to accessing the control plane from those smart objects. For instance, SdkTask would directly access the Flyte Admin client to fetch a task instead of going through the FlyteTask wrapper layer.
  • Introduction of a new SdkRunnableWorkflow class mean to encapsulate the local workflow object (i.e. you only have this object when you have the code locally available, and imported into the Python process). Despite the name, it has no bearing on whether or not a workflow is runnable, either locally or otherwise.
    • This is in a separate file and user-space level objects like Output was also moved, but we don't have to if it breaks people's imports.
    • The original constructor to SdkWorkflow has changed. Hopefully not too many people are using it.
  • Cleaning up the abstract base classes by creating a Registerable ABC meant to encapsulate control plane capability and a LocalEntity ABC meant to encapsulate local-only functionality.
  • Removing all the future stuff. We only used stuff that is mandatory in 3.0 and we've decided we're no longer supporting Python 2.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

Please see the above.

Tracking Issue

flyteorg/flyte#226

Follow-up issue

flyteorg/flyte#226

self,
default_inputs: Dict[str, _promise.Input] = None,
fixed_inputs: Dict[str, Any] = None,
schedule=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we introduce type hints here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added it here, but it's missing from so many places... let's address in future PRs.

return self._has_registered


class HasDependencies(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are just modeling a graph - albeit poorly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. dependency tracking in flytekit is confusing. We have nodes, which have upstream nodes, but we also have tasks/workflows/launchplans, which also keep track of upstream entities.

I'd like to get to the point where we only use nodes to keep track of registration order and such. This should make the topological sort easier to reason about as well.

It's not quite just the workflow graph though... if you have a workflow that statically references another workflow, somehow you'll need to know to process the second workflow first. same for default launch plans. we'll probably have to build up a registration graph which contains these inter-entity dependencies. This is effectively what the entity level upstream entity array keeps track of now, but i feel like this is cleaner. anyways, not this pr.

@@ -199,7 +210,14 @@ def fetch_latest(cls, project, domain, name):
:rtype: SdkTask
"""
named_task = _common_model.NamedEntityIdentifier(project, domain, name)
admin_task = _engine_loader.get_engine().fetch_latest_task(named_task)
client = _flyte_engine._FlyteClientManager(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is increasing the blast radius of creating the client manager. I see this code fragment repeated again again and again.

I think
fetch register etc should take in a context that has the client in it. But I understand that is a large change. In the interim have a helper method that creates the client, instead of passing the url and insecure flag repeatedly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return execution_data.full_inputs

if execution_data.inputs.bytes > 0:
with _common_utils.AutoDeletingTempDir() as t:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually want to get rid of the concept of AutoDeletingTempDir in the framework code. I think this should be a user decision

@kumare3
Copy link
Contributor

kumare3 commented Sep 10, 2020

This is a huge change and I love it! A few comments, mostly to push for next steps. I dont think any of the feedback needs action right away.

EngHabu
EngHabu previously approved these changes Sep 13, 2020
EngHabu
EngHabu previously approved these changes Sep 14, 2020
import importlib
import pkgutil

import six

from flytekit.common.exceptions import user as _user_exceptions
from flytekit.common.local_workflow import SdkRunnableWorkflow as _PythonWorkflow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the _ name is different? Intentional?

@@ -112,11 +114,13 @@ def workflow(nodes, inputs=None, outputs=None, cls=None, on_failure=None):
:param dict[Text,Output] outputs: [Optional] A dictionary of output descriptors for a workflow.
:param T cls: This is the class that will be instantiated from the inputs, outputs, and nodes. This will be used
by users extending the base Flyte programming model. If set, it must be a subclass of
:py:class:`flytekit.common.workflow.SdkWorkflow`.
:py:class:`flytekit.common.local_workflow.PythonWorkflow`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong type?

@wild-endeavor wild-endeavor merged commit fd57a48 into master Sep 14, 2020
max-hoffman pushed a commit to dolthub/flytekit that referenced this pull request May 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants